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Small effective population sizes and active inbreeding can lead to inbreeding 
depression due to deleterious recessive mutations exposed in the homozygous 
state. The Thoroughbred racehorse has low levels of population genetic diver- 
sity, but the effects of genomic inbreeding in the population are unknown. 
Here, we quantified inbreeding based on runs of homozygosity (ROH) using 
297 K SNP genotypes from 6128 horses born in Europe and Australia, of 
which 13.2% were unraced. We show that a 10% increase in inbreeding 
(Fron) is associated with a 7% lower probability of ever racing. Moreover, a 
ROH-based genome-wide association study identified a haplotype on 
ECA14 which, in its homozygous state, is linked to a 32.1% lower predicted 
probability of ever racing, independent of Froy. The haplotype overlaps a 
candidate gene, EFNAS, that is highly expressed in cartilage tissue, which 
when damaged is one of the most common causes of catastrophic musculo- 
skeletal injury in racehorses. Genomics-informed breeding aiming to reduce 
inbreeding depression and avoid damaging haplotype carrier matings will 
improve population health and racehorse welfare. 


1. Introduction 


Thoroughbred horses are valuable domestic animals bred for competitive 
racing [1-3]. The small number of founders [4] and human-mediated selection 
that disproportionately favours the preservation of popular sire-lines [5] has led 
to an inbred population with a small effective population size [6,7]. Purposeful 
inbreeding is common, with breeders attempting to leverage beneficial genetic 
variants by choosing mates with shared ancestors such that multiple ancestors 
may be duplicated in a five-generation pedigree. A recent trend of increasing 
inbreeding has been observed in the population [5]. Inbreeding commonly 
leads to inbreeding depression, the reduced fitness of individuals due to dele- 
terious recessive (or partially recessive) mutations exposed in the homozygous 
state [8,9]. In livestock, for every 1% increase in (pedigree-estimated) inbreed- 
ing, there is an estimated 0.13% decrease in the mean selected trait value [10]. 

Runs of homozygosity (ROH), long stretches of the genome identically 
inherited from both parents, can occur as the result of active inbreeding or 
other demographic processes that reduce effective population size [11]. The pro- 
portion of the genome covered by ROH affects production traits in livestock 
populations [12,13]. Generally, shorter ROH reflect distant inbreeding resulting 
from a common ancestor many generations back in the pedigree, whereas long 
ROH reflect a more recent common ancestor [14]. Mutations in long ROH are 
expected to be more harmful (deleterious) than those in short ROH since 
there have been fewer generations for purging to occur. In support of this, 
long ROH are associated with stronger inbreeding depression than short 
ROH in wild Soay sheep [15] and in humans, deleterious variants are enriched 
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in long ROH [16]. However, in cattle deleterious variants are 
enriched in short and medium ROH, suggested to be due to 
hitchhiking with selected beneficial alleles [17]. 

In Thoroughbreds, a lower-than-expected mutational load 
has led to the hypothesis that existing selection processes are 
effective in purging deleterious mutations [7]. This may be 
facilitated by the unusually large census population size rela- 
tive to the effective population size [5]; for instance, although 
racing is the breeding goal, a high proportion (35-50%) of 
foals born never race [18-22] and a small proportion enter 
the breeding population (less than 5% of males and 50% of 
females). While this selection may purge large-effect mutations 
it is expected that even at low frequencies, segregation of 
deleterious alleles within a population with a low effective 
population size will have consequences for some aspects of 
population viability [8] and inbreeding depression is likely to 
be persistent even with ongoing purging [15]. An analysis of 
pedigree-based inbreeding (Wright’s inbreeding coefficient, 
F) in the Thoroughbred found a strong negative relationship 
between inbreeding and racing performance measures [23], 
which concluded that there is still considerable genetic load 
within the population. 

To some degree, inbreeding is unavoidable in closed 
breeding populations [24]. Understanding how inbreeding 
may affect individual and population fitness is critical to 
improve breeding management and, for the Thoroughbred, 
may have both economic and welfare implications. For Thor- 
oughbred breeders, the breeding goal is to produce viable 
foals that will have productive racing careers and establish 
their value by winning races. The number of races that an 
individual horse participates in and a horse’s career duration 
are considered key indicators of animal health [25]. At a 
population level, it has been proposed that the success 
of the Thoroughbred industry may be measured by the 
proportion of horses born that commence racing [22]. 

Here, we used genome-wide SNP genotypes in a large 
cohort of Thoroughbred horses to examine the effects of 
genome-wide inbreeding on the probability of ever racing. 
Then, to identify regions of the genome containing large-effect 
loci, we performed a ROH-based genome-wide association 
analysis for the probability of racing in the population. 


2. Methods 
(a) Phenotypes, genotype filtering and imputation 


SNP genotypes, race region and year of birth were available for 
n=8951 Thoroughbred horses. Race records (up to the end of 
the 2020 racing season) were retrieved for n=6128 of these 
horses that were born in Europe (EUR) and Australia and New 
Zealand (ANZ) prior to and including 2015, and were therefore 
at least five years old. Among horses that race, the majority 
have their first start before they are five years old [22], and the 
median age of retirement from racing has been reported as 
five years old [19]. Horses were assigned as ‘raced’ (n = 3038, 
EUR, n=2282 ANZ) if they had at least one start before 
five years old or ‘unraced’ (n= 606 EUR, n=202 ANZ) if they 
had no recorded race start before five years old. Race records 
for the major race regions (Europe, Australia and North America) 
were used to partition samples into the two cohorts searching all 
regions including other than birth region. We cannot, however, 
rule out that horses categorized as ‘unraced’ may have raced in 
(minor) regions of the world that were not searched. 


Two SNP genotyping platforms were used to genotype | 2 | 


the animals; n = 4933 were genotyped on the Illumina Equine 
SNP70 BeadChip (Illumina, San Diego, CA) comprising approxi- 
mately 70 000 SNPs (SNP70) and n = 4018 were genotyped on the 
Axiom Equine Genotyping Array (Axiom MNEC670) (Affyme- 
trix, Santa Clara, CA) comprising approximately 670000 SNPs 
(SNP670). All samples had a call rate greater than 95%. Samples 
genotyped on the SNP70 array were imputed up to 488 576 SNPs 
with BEAGLE v. 5.2 [26] using the samples genotyped on the 
SNP670 genotyping platform as a reference set. See electronic 
supplementary material, tables S1 and S2 for full details of the 
analysis cohorts. 


(b) Post-imputation filtering, quality control and 


population structure 

After imputation, we discarded SNPs with a Beagle dosage R? 
less than 0.8 to remove poorly imputed SNPs. We then filtered 
for SNPs with call rates greater than 0.99, minor allele frequency 
greater than or equal to 0.01 and retained only SNPs located on 
autosomes, leaving a final dataset with 296 691 SNPs. Based on 
this dataset, we conducted a principal component analysis in 
PLINK v1.90b [27], which showed only minor population strati- 
fication between EUR and ANZ horses in our dataset (electronic 
supplementary material, figure S1). 


(c) Quantifying runs of homozygosity and inbreeding 
(Frou) 


We called ROH with a minimum length of 300 kb using -homozyg in 
PLINK v1.90b [27] with the following parameters: -homozyg- 
window-snp 30 -homozyg-snp 30 -homozyg-kb 300 -homozyg- 
gap 150 -homozyg-density 100 -homozyg-window-missing 1 
-homozyg-window-het 1. We chose 300 kb as the minimum ROH 
length as we were interested in evaluating fitness effects of both 
shorter and longer ROH. Moreover, based on the SNP density and 
horse genome size, we expect around 40 SNPs in a 300 kb stretch 
of ROH, which should be sufficient to reliably call short ROH. 

Inbreeding coefficients (Frow) were calculated by summing 
the total length of ROH for each individual and dividing by 
the autosomal genome length of 2281 Mb [28,29]. Shorter ROH 
with most-recent common ancestors from further back in the pedi- 
gree might differ in their fitness effects compared to long ROH 
[15-17]. We therefore also calculated an inbreeding coefficient for 
ROH less than 5 Mb (Frow short) and for ROH > 5 Mb (Fron _tong)- 
While the cutoff is semi-arbitrary, ROH of length 5 Mb are expec- 
ted to have a common ancestor haplotype approximately 10 
generations ago, calculated as Ine with g being the number of 
generations [30], assuming a uniform recombination map such 
that 1 Mb is equivalent to 1 cM. The effects of FROH_long and 
Fron short can therefore be broadly interpreted as effects of more 
recent versus older inbreeding, or more precisely, of younger 
versus older haplotypes. 


(d) Modelling inbreeding effects on fitness to race 

We modelled the effects of inbreeding on racing using general 
linear mixed models in Ime4 [31]. A horse was assigned 0 if it 
had never raced and 1 if it had raced, and we therefore used a 
binomial error distribution with logit link with the following 
model structure: 


Pr(raced; =1)= logit”! (Bo + Fron, Bi + region; By + sex; B3 + a) 


ag 9 NO, OFrityear)s for K=1,...,38. 


The probability of having raced Pr(raced;= 1) was modelled 
with an intercept By and fixed effects for individual inbreeding 
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Figure 1. Predicted probability (and 95% confidence intervals) of racing for different inbreeding coefficients (Fro). (a) Predictions shown alongside raw data (horses 
that have raced at 100% and those that have not raced at 0%). (b) Close-up of the relevant plotting area shown in (a). 


Fron, region (ANZ and EUR) and sex (EF M). We also fitted 
birth year as a random effect in the model. The sample size 
was 1= 6128, with 2484 and 3644 horses from ANZ and EUR, 
respectively. To disentangle effects of short and long ROH, we 
re-fitted the model and replaced Fron with two predictors 
E ROH_short and F ROH_long: 


(e) Mapping loci underpinning inbreeding depression in 
fitness to race 


To detect potential large-effect loci involved in inbreeding 
depression and to distinguish these from additive SNP effects, 
we performed a ROH-based genome-wide association study 
(GWAS) [13,15]. At every SNP location, we fitted a binomial 
mixed model with logit link using Ime4 with the following 
structure: 


Pr (raced; = 1) = logit”! (By + SNPROH nai 
Tr SNPROF pei P2 ar SNPapp, Bs 
+ FROF pti B4 + region;Bs + sexi Bg + Br_y7PCi-10 


birth year birthyear 
aa a a, N(O, Obirth year 


), fork=1,...,38. 


The predictors of interest are SNPROH ie.4i ANA SNPROH sis 
which are binary predictors quantifying whether an individual 
had a ROH overlapping a given SNP position with allele A as 
the homozygous genotype (SNPRoHyy.,i = 1) or with allele B as 
the homozygous genotype (SNPROH psi = 1). When the SNP 
was not in a ROH it was coded as 0. These predictors indicate 
whether ROH have an effect on the probability of racing at a 
given position in the genome. The reason for fitting two ROH 
predictors is that a large-effect deleterious allele is likely to 
occur only on certain haplotypic backgrounds. To ensure that 
a ROH effect is not simply an additive effect, we also fitted a 
predictor with the SNP genotype coded as 0,1,2 for homozygous, 
heterozygous and homozygous for the alternative allele. Frou,,,, 
is the individual inbreeding coefficient calculated from all auto- 
somes except for the chromosome containing the focal SNP. 
This ensures that the estimated effects are local ROH effects 
separate to the genome-wide inbreeding level. As before, 
we fitted region and sex as further fixed effects as well as 10 
principal components based on the variance-standardized addi- 
tive relationship matrix to account for relatedness within the 
sample. Lastly, we again included year of birth as a random 
effect. For each model, we extracted the model estimates for 
both ROH effects and their p-values calculated using Wald Z 


tests. We determined a genome-wide significance threshold by 
calculating the effective number of tests when accounting for 
linkage disequilibrium using ‘simpleM’ [32]. This led to an esti- 
mated 90900 independent tests, which we doubled (as two 
tests per model were performed), before using this value for a 
Bonferroni correction of p-values, leading to a genome-wide 
significance threshold of p< 2.75 x 107’. 


3. Results 


(a) Patterns of inbreeding in the genome 


and across time 
First, we evaluated ROH using 297 K SNP genotypes generated 
for n=8951 Thoroughbred horses born in EUR and ANZ 
between 1971 and 2020. On average, individual animals had 
415 (range 229-994) ROH segments greater than 300 kb with 
a mean length of 1.5 Mb (the longest ROH spanned 67 Mb) 
that covered approximately 28% of each genome (mean 
Fron = 0.28; range 0.18-0.40) (electronic supplementary 
material, figure S2). Out of these, on average 24 (range 0-49) 
were long ROH greater than 5 Mb and 391 (range 228-994) 
were short ROH less than 5 Mb, covering 8.7% and 19.5% of 
the genome, respectively. The frequency of ROH varied across 
the genome, including hotspots where up to 90% of horses 
have ROH and coldspots where ROH were rare (electronic sup- 
plementary material, figure 53), in line with high variation 
observed in other species [15,33]. Consistent with our previous 
report [5] we observed an increase in Froy in both EUR and 
ANZ over time (electronic supplementary material, figure 54). 


(b) Inbreeding effects on racing 

To test the hypothesis that inbreeding influences failure to 
race, we modelled the effects of inbreeding (Frox) on the 
probability of racing among n= 6128 horses using general 
linear mixed models. The log-odds of racing decreased 
significantly (p=0.0009) with increasing Froy (log-odds 
ratio (log(OR) [95% confidence interval (CI) = —-5.75 [—9.15, 
—2.36], electronic supplementary material, table 53), corre- 
sponding to a predicted decrease in the odds of racing by 
44% for a 10% increase in Fron. It is also possible to translate 
these effects into predicted probabilities of racing (figure 1). 
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Figure 2. Predicted probability (and 95% confidence intervals) of racing for different inbreeding coefficients Foy based on (a) long ROH greater than 5 Mb and (6) 
short ROH less than 5 Mb. (a)(i) and (b)(i) show predictions alongside raw data (horses that have raced at 100% and those that have not raced at 0%); (a)(ii) 


and (b)(ii) zoom closer into the relevant plotting area. (Online version in colour.) 
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Figure 3. GWAS of ROH effects on the probability of ever racing. The Manhattan plot shows p-values for ROH effects on the probability of racing at every SNP 
location across the genome. Per SNP, two effects of ROH were tested to differentiate between ROH containing the minor allele and ROH containing the major allele. 
The hit on chromosome 14 corresponds to a negative effect of ROH on the probability of racing. (Online version in colour.) 


According to the model, a horse with the highest observed 
inbreeding coefficient (FroH=0.40) had roughly a 13% 
lower probability of ever racing than a horse with the 
lowest observed inbreeding coefficient (FroH = 0.18). Horses 
that were 10% more inbred than average had a 7% lower prob- 
ability of ever racing, while horses with a Froy 10% lower than 
the mean had a 4% higher probability of racing (see electronic 
supplementary material, table S4 for numeric predictions). To 
test for differences in inbreeding depression between EUR 
and ANZ horses, we also fitted a model with an interaction 
between Frox and region (EUR and ANZ), but the slopes 
were very similar and the interaction was not significant 
(electronic supplementary material, figure S5 and table S5). 
In addition to modelling inbreeding effects on whether a 
horse had raced at all, we also fitted a model to test whether 
Fron is linked to a lower number of races among those 
horses that did race, but here Fro was not statistically 
significant (electronic supplementary material, table S6). 


(c) Inbreeding effects on racing by runs of 


homozygosity length 

Long ROH had a strong negative effect on the probability 
of racing, while short ROH had no effect (figure 2; electronic 
supplementary material, table S7), indicating that recent 
inbreeding rather than historic inbreeding is the cause of 
inbreeding depression for this trait in the population. The pre- 
dicted log-odds of racing decreased with increasing Frou tong 
(log(OR) [95% CI] = -6.57 [-10.04, —3.11]) corresponding to a 
predicted decrease in the odds of racing by 48% for a 10% 
increase in FRor jong: 


(d) Identification of a large-effect haplotype 

The GWAS uncovered a single significant hit (p = 9.09 x 10-*) 
for a ROH overlapping the minor allele A at SNP 
rs1144708552 at position 63883487 bp on ECA14 (log(OR) 
[95% CI]=-1.82 [-2.48, -1.15]; figure 3; electronic 


1902207 ‘682 J 005 “y 20d qdsi/jeusno{/610-burysi|qndya1os|eXou | 


Downloaded from https://royalsocietypublishing.org/ on 01 August 2023 


supplementary material, figure S6 and table $8). While only 
64 individuals (1%) were homozygous for the A allele and 
had overlapping ROH (ROH + AA), these horses had a 32% 
lower predicted probability of ever racing (electronic sup- 
plementary material, table S8) when keeping other factors 
such as Froy constant. The 64 ROH+AA horses were the 
progeny of 40 sires. 31.25% of ROH+AA horses were 
unraced, compared to the sample average 13.1%. Further- 
more, ROH + AA horses that did race had on average 13.3 
races, compared to the sample average of 16.3 races, 
suggesting that while some ROH+AA horses may race, 
they are less durable. Notably, genome-wide inbreeding 
(Fron) also showed a strong effect on the probability of 
racing in the same model (log(OR) [95% CI] = —6.39 [—9.88, 
—2.90]; electronic supplementary material, table $9). Conse- 
quently, inbreeding depression in the Thoroughbred is 
largely due to the genome-wide effects of many recessive 
deleterious mutations in addition to a larger effect recessive 
mutation expected to be in proximity to SNP rs1144708552. 

Using phased genotype data, we screened haplotypes 
surrounding the rs1144708552 SNP. Specifically, we focussed 
on haplotypes of length 1 Mb, including all SNPs occurring 
up to 500 kb on each side of rs1144708552. Among ROH + 
AA horses, 78.1% were homozygous for an identical haplo- 
type (electronic supplementary material, table S10), which 
we refer to as THR14. All but one ROH + AA horse carried 
at least one exact copy of THR14, and where haplotypes dif- 
fered from THR14, this was on average by only 2 out of 
84 base positions that make up the haplotype. THR14 was 
significantly (p=0.006) overrepresented among breeding 
stallions (n=130), where it had a frequency of 15.4% 
compared to 10.1% in the overall sample. 


(e) Candidate genes for survival/viability 

Since most (approx. 80%) of the ROH in the identified region 
extend across greater than 2.8 Mb we searched a 3 Mb region 
surrounding the rs1144708552 SNP (+1.5 Mb) for genes with 
biological functions that may impact on survival or viability 
of a horse. The most compelling candidate gene among the 
three protein-coding genes (FER, FBXL17 and EFNAS) in the 
region was EFNA5 (ECA14: 63365481-63631761 bp), which 
was also the closest (approx. 250 kb) gene to the rs1144708552 
SNP (electronic supplementary material, figure $7). 


4. Discussion 


Inbreeding in livestock populations is difficult to avoid due to 
the often closed nature of the gene pool. Nonetheless, limit- 
ing inbreeding is likely to reduce the impact of undesirable 
inherited features. Until now, the functional or clinical conse- 
quences of inbreeding in the Thoroughbred population were 
not known. Here we show, using SNP genotypes for a large 
cohort of Thoroughbred horses in two of the major global 
breeding regions, that inbreeding depression is a genome- 
wide phenomenon significantly impacting on the viability 
of a horse to ever race. Given the current pedigree structure 
of Thoroughbreds, avoiding breeding close relatives is chal- 
lenging and the increasing levels of inbreeding in the 
population indicates that existing strategies to mitigate 
inbreeding may be inadequate. Some breeders consider that 
the duplication of influential ancestral horses in a pedigree 
may be advantageous and there are numerous examples of 


close pedigree inbreeding that result in successful racehorses. JR 


Indeed, here we show that inbreeding in the distant pedigree, 
measured as Fron short, is not disadvantageous to the breeding 
goal. This observation is in agreement with an analysis of 
pedigree-based inbreeding in the Australian Thoroughbred 
population that suggested that the ancestral history coefficient 
of inbreeding, the number of times an allele has been identical 
by descent in an individual’s pedigree, has a positive associ- 
ation with racing performance and probably captures the 
effects of positive selection for favourable exercise-relevant 
traits over many generations [23]. However, more recently 
shared common ancestors, indicated by Fror_jong, have a 
considerable negative impact on the viability of a horse for 
racing and contribute to wastage in the population. Although 
not quantified here, it is likely that these long ROH contain 
a higher proportion of rare, deleterious alleles, which 
cumulatively cause the inbreeding depression observed. 

Large-effect mutations are expected to be more easily 
purged by selection [8]. However, here as well as describing 
the genome-wide architecture of inbreeding, we have 
identified a relatively common (10.1%) haplotype with an 
auxiliary effect on probability to race. This haplotype did not 
overlap any known locus under selection for Thoroughbreds 
[5,34-36]. Nonetheless, there was a higher than expected fre- 
quency of the THR14 haplotype among stallions, which 
occurs in one in six breeding stallions in our sample. Estimating 
from the frequency of the haplotype in our sample (10.1%), 
among the 8542 foals registered in Ireland in 2020, 87 are 
expected to be homozygous for the THR14 haplotype with 
1551 carriers. 

EFNAS (ECA14: 63 365 481-63 631 761 bp), which encodes 
the ephrin ligand, ephrin-A5, is a compelling candidate gene 
for the negative effect of the THR14 haplotype on racing. 
Ephrin A5 is a member of the ephrin family of ligands and 
receptors that are broadly expressed during tissue develop- 
ment and repair [37-42]. Ephrin A5 is involved in neonatal 
muscle development and regeneration, regulates cardiomyo- 
cytes and is associated with growth traits and digital cushion 
thickness (a strong predictor of lameness) in cattle [42-46]. 
Ephrin ligand-receptor signalling also has a well character- 
ised role in skeletal development and bone function 
[47-49]. Generally, ephrin A member ligands bind to ephrin 
A receptors; however, ephrin A5 is known to bind to the 
ephrin B2 receptor [50], which is encoded by a gene involved 
in osteogenic control and hip fracture in humans [51,52]. The 
related ephrin B1 protein is required for the fracture repair 
process [53]. Ephrin A5 has been proposed as a negative regu- 
lator of osteogenic differentiation [54] and is highly expressed 
in cartilage [55] where it participates in tissue growth and 
regeneration [56]. Although articular cartilage is a highly 
resilient tissue, it has a poor capacity for regeneration and 
articular cartilage damage in horses is one of the most 
common causes of catastrophic musculoskeletal injury [57]. 
Pre-existing degenerative articular cartilage and bone lesions 
have been commonly identified in euthanized horses post- 
injury [58] resulting from cumulative damage to the tissues 
from repetitive strain during exercise [59,60]. In this regard, 
the THR14 haplotype overlaps a large suggestive GWAS 
peak on ECA14 (14: 55530859-70816518) for osteochondrosis 
in the Belgian Warmblood breed [61]. Osteochondrosis is one 
of the most common skeletal diseases in the horse affecting car- 
tilage and bone, causing inflammation of joints, pain, lameness 
and a decrease in athletic performance. 
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There are many reasons that a foal may not become a 
viable racehorse [18-20]. However, musculoskeletal fractures 
are the most common cause of death at all stages of develop- 
ment, training and racing [19], and are a major concern for 
racehorse welfare. The known biological functions of 
EFNA5 and the haplotype association with the probability 
of racing that we report here lead to the hypothesis that it 
may play a role in musculoskeletal injury risk; however, 
this must be tested in a population of horses with well- 
defined phenotypes. While the nature of the hypothesized 
causative mutation on THR14 is not known, we provide 
here strong evidence for the presence of a mutation with a 
negative effect on racing located within this haplotype that 
has not previously been described. 

In summary, the introduction of genome-enabled breed- 
ing strategies to avoid the production of long homozygous 
stretches in the genome and homozygosity of the THR14 
haplotype could improve economic returns for breeders 
and positively impact animal welfare. Therefore, industry- 
guided monitoring of genome-wide inbreeding over time 
will be important to maintain sustainable populations of 
horses, with particular attention focused on higher resolution 
information that can be obtained for deleterious haplotypes 
such as THR14. 
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